Statistical Learning Techniques for Costing XML Queries
Abstract
Developing cost models for query optimization is significantly harder for XML queries than for traditional relational queries. The reason is that XML query operators are much more complex than relational operators such as table scans and joins. In this paper, we propose a new approach, called Comet, to modeling the cost of XML operators; to our knowledge, Comet is the first method ever proposed for addressing the XML query costing problem. As in relational cost estimation, Comet exploits a set of system catalog statistics that summarizes the XML data; the set of “simple path” statistics that we propose is new, and is well suited to the XML setting. Unlike the traditional approach, Comet uses a new statistical learning technique called “transform regression” instead of detailed analytical models to predict the overall cost. Besides rendering the cost estimation problem tractable for XML queries, Comet has the further advantage of enabling the query optimizer to be self-tuning, automatically adapting to changes over time in the query workload and in the system environment. We demonstrate Comet’s feasibility by developing a cost model for the recently proposed XNav navigational operator. Empirical studies with synthetic, benchmark, and real-world data sets show that Comet can quickly obtain accurate cost estimates for a variety of XML queries and data sets.
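The learning-based idea in the abstract, predicting an operator's cost from summary statistics instead of from a hand-built analytical model, can be illustrated with a minimal sketch. This is not Comet's method: ordinary least squares stands in for transform regression, and the two features (`nodes_visited`, `results_returned`), the training numbers, and all function names are hypothetical and chosen only for illustration.

```python
# Sketch only: ordinary least squares stands in for transform regression.
# Feature names and training numbers are hypothetical, not from the paper.

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_cost_model(features, costs):
    """Fit cost ~ w0 + w1*f1 + ... + wk*fk via the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * c for r, c in zip(rows, costs)) for i in range(k)]
    return solve(xtx, xty)


def predict_cost(weights, feature_vector):
    """Evaluate the fitted linear cost model on one feature vector."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature_vector))


# Hypothetical observations: (nodes_visited, results_returned) -> measured cost.
training_features = [(100, 10), (200, 15), (400, 80), (800, 40), (1600, 300)]
observed_costs = [2.0 + 0.01 * n + 0.05 * r for n, r in training_features]

weights = fit_cost_model(training_features, observed_costs)
estimate = predict_cost(weights, (1000, 100))
```

Under this framing, the self-tuning behaviour mentioned in the abstract would correspond to refitting the model as new (features, observed cost) pairs are collected, although the sketch does not attempt to reproduce how Comet actually does this.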
Similar resources
Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML
As moving to big data world where data is increasing in unstructured way with high velocity, there is a need of data-store to store this bundle amount of data. Traditionally, relational databases are used which are now not compatible to handle this large amount of data, so it is needed to move on to non-relational data-stores. In the current study, we have proposed an extension of the Mongo...
Analysis and design of approximate queries over XML documents using statistical techniques
In the last few years several repositories for storing XML documents and languages for querying XML data have been studied and implemented. All the query languages proposed so far allow to obtain exact answers, but when applied to large XML repositories or warehouses, such precise queries may require high response times. To overcome this problem, in traditional relational warehouses fast approx...
Learning n-ary tree-pattern queries for web information extraction
The problem of extracting information from the Web consists in building patterns allowing to extract specific information from documents of a given Web source. Up to now, most existing techniques use string-based representations of documents as well as string-based patterns. Using tree representations naturally allows to overcome limitations of string-based approaches. While some tree-based app...
Relational Databases Query Optimization using Hybrid Evolutionary Algorithm
Optimizing the database queries is one of hard research problems. Exhaustive search techniques like dynamic programming is suitable for queries with a few relations, but by increasing the number of relations in query, much use of memory and processing is needed, and the use of these methods is not suitable, so we have to use random and evolutionary methods. The use of evolutionary methods, beca...